Abstract

A general form of linear factor analysis is defined, and presented as a method to 
factor a data matrix, similar in many respects to principal component analysis. We 
discuss necessary and sufficient conditions for solvability of the factor analysis equations 
and give a constructive method to compute all solutions. A follow-up paper will present
the corresponding algorithm.
Note: This is a working paper which will be expanded/updated frequently. All suggestions
for improvement are welcome. The directory deleeuwpdx.net/pubfolders/factor has a pdf 
version, the complete Rmd file with all code chunks, and the bib file. 

1 Introduction 

Problem: Suppose $X$ is an $n \times m$ real matrix. Suppose $\mathcal{H}$ is a subset of
$\mathbb{R}^{p \times p}$ and $\mathcal{S}$ is a subset of $\mathbb{R}^{m \times p}$. We want to find the
solutions of the equation $X = FS'$ with the $n \times p$ matrix $F$ of factor scores such that
$F'F$ is in $\mathcal{H}$ and the $m \times p$ matrix $S$ of factor loadings is in $\mathcal{S}$.

For reference purposes we repeat our general factor analysis model. 
$$X = FS', \tag{1}$$

$$F'F = H, \tag{2}$$

$$H \in \mathcal{H} \subseteq \mathbb{R}^{p \times p}, \tag{3}$$

$$S \in \mathcal{S} \subseteq \mathbb{R}^{m \times p}. \tag{4}$$


Note that we do not require that $p \leq \min(m, n)$ or even $p = \text{rank}(X)$. The interesting case
is $p > m$, more factors than variables. There is also no requirement that $n \geq m$, by the way.


In classical exploratory orthogonal common factor analysis we require $H = I$ and
$S = \begin{bmatrix} S_1 & S_2 \end{bmatrix}$, with the $m \times m$ matrix $S_2$ diagonal. The
$m \times (p - m)$ matrix $S_1$ has the common factor loadings. In confirmatory common factor
analysis we keep the basic structure of $S$ but we may fix some loadings at known values,
mostly zero. In non-orthogonal versions of common factor analysis we allow for non-zero
off-diagonal elements in $H$, usually to model correlations between common factors.


In this paper we rederive some of the basic algebraic results for factor analysis. These results, 
in various forms and disguises, can be found in many places, starting with Wilson (1928) 
and culminating - at least for me - in Guttman (1955). I desperately needed some clarity 
after reading much of the factor indeterminacy literature, which - at least for me - has a 
light-to-heat ratio close to zero. I admire Steiger and Schönemann (1978) and Steiger (1979)
for going through half a century of clunky notation, polemics, scientific denial, and doubtful 
results. 


Our proofs are based on powerful matrix decomposition tools, the singular value decomposition 
and the eigenvalue decomposition. This makes our proofs quite different from what one 
normally finds, especially in the older literature. Matrix algebra has come a long way. 


2 Two Steps 

Instead of directly tackling the problem of solving equations (1)-(4) we proceed in two
steps. We first give a trivial necessary condition for solvability, which has been important
throughout the history of factor analysis.

Theorem 1: [Necessary] If S and F solve (1)-(2) then X'X = SHS'.

Proof: Duh. QED 
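
For completeness, the one-line argument behind Theorem 1 can be written out as a worked equation, using only (1) and (2):

$$X'X = (FS')'(FS') = S\,F'F\,S' = SHS'.$$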

As we shall see in a moment, the condition X'X = SHS' is also sufficient for solvability of
(1)-(2), and consequently (1)-(4) is solvable if and only if

$$X'X = SHS', \tag{5}$$

$$S \in \mathcal{S}, \tag{6}$$

$$H \in \mathcal{H}. \tag{7}$$



Of course it does not follow that conditions (5)-(7) determine S and H uniquely. Because
of the generality of our constraints on S and H it is quite useless to look for identification 
conditions. 

The usual approach in factor analysis theory is to first solve (5)-(7) for S and H, and then
find all F for which X = FS' and F'F = H. The fact that such an F always exists if (5)-(7)
are solvable is sometimes called the Fundamental Theorem of Factor Analysis (Kestelman
(1952)). The fact that there are usually multiple solutions for F to X = FS' and F'F = H
for given S and H that solve (5)-(7) is called the Factor Score Indeterminacy Problem.

In factor analysis computation a similar two step process is followed. First an approximate 
solution to (5)-(7) is computed, using multinormal maximum likelihood or least squares.
This gives an $S \in \mathcal{S}$ and an $H \in \mathcal{H}$. Then, in the second step, we compute an approximate
solution to X = FS' and F'F = H, using S and H from the first step. This is sometimes 
called Factor Score Estimation. 


3 Least Squares 

In this paper we study only the second step. We assume we have some S and H that
may or may not satisfy (5)-(7). We then find an expression for all F with F'F = H that
minimize

$$\sigma(F) = \text{tr}\,(X - FS')'(X - FS'). \tag{8}$$

This is “factor score estimation”. We then look into conditions under which the minimum of the
least squares loss function $\sigma$ is zero. Those conditions give us the “fundamental theorem of
factor analysis”. And finally, if the minimum of $\sigma$ is zero, the set of all solutions defines the
“factor score indeterminacy problem”.
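
It may help to see which part of the loss actually depends on F. Writing $\sigma$ for the loss in (8) and using only the constraint F'F = H, a worked expansion is

$$\sigma(F) = \text{tr}\,X'X - 2\,\text{tr}\,F'XS + \text{tr}\,S\,F'F\,S' = \text{tr}\,X'X + \text{tr}\,SHS' - 2\,\text{tr}\,F'XS,$$

so minimizing $\sigma$ over F'F = H amounts to maximizing $\text{tr}\,F'XS$.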

Lemma 1: [Crossproduct] If the $p \times p$ positive semi-definite matrix $H$ of rank $s$ has
eigenvalue decomposition

$$H = \begin{bmatrix} N & N_\perp \end{bmatrix} \begin{bmatrix} \Phi^2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} N' \\ N_\perp' \end{bmatrix} \tag{9}$$

with the $s \times s$ diagonal matrix $\Phi^2$ positive definite, then the $n \times p$ matrix $F$ satisfies $F'F = H$
if and only if $F$ has singular value decomposition $F = T \Phi N'$, with some $n \times s$ matrix $T$ with
$T'T = I$.

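Proof: Write $F$ in the form

$$F = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} N' \\ N_\perp' \end{bmatrix}. \tag{10}$$

Then $F'F = H$ if and only if

$$\begin{bmatrix} A'A & A'B \\ B'A & B'B \end{bmatrix} = \begin{bmatrix} \Phi^2 & 0 \\ 0 & 0 \end{bmatrix}.$$

Thus $F'F = H$ if and only if $B = 0$ and $A'A = \Phi^2$, which can be written as $A = T\Phi$ with
$T'T = I$. QED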
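
Lemma 1 is easy to check numerically. The following is a minimal R sketch, not code from the paper: the matrix H, the seed, the sizes and the tolerance `1e-10` are all choices made here for illustration. It builds F as T times Phi times N' from a random orthonormal T and confirms that F'F reproduces H.

```r
set.seed(2)
n <- 8
# H: positive semi-definite of order 4 and rank 3 (one singular 2 x 2 block).
h <- diag(4)
h[1:2, 1:2] <- matrix(c(1, -1, -1, 1), 2, 2)
# Eigen decomposition (9): keep only the positive eigenvalues.
e <- eigen(h, symmetric = TRUE)
pos <- e$values > 1e-10
nmat <- e$vectors[, pos, drop = FALSE]              # N, p x s
phi <- diag(sqrt(e$values[pos]), nrow = sum(pos))   # Phi, s x s
# Any n x s matrix T with orthonormal columns will do; here taken from a QR.
tmat <- qr.Q(qr(matrix(rnorm(n * sum(pos)), n, sum(pos))))
f <- tmat %*% phi %*% t(nmat)
# Largest absolute deviation between F'F and H.
max(abs(crossprod(f) - h))
```

Different random T give different F, all with the same crossproduct H.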

Theorem 2: [Least_Squares] Suppose $X$ is an $n \times m$ matrix, $S$ is an $m \times p$ matrix, and
$H$ is a positive semi-definite $p \times p$ matrix of rank $s$ with eigenvalue decomposition (9). Define
the $n \times s$ matrix $R = XSN\Phi$ and suppose

$$R = \begin{bmatrix} P & P_\perp \end{bmatrix} \begin{bmatrix} \Gamma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q' \\ Q_\perp' \end{bmatrix} \tag{11}$$

is a singular value decomposition of $R$ and rank $R$ is $r$. The minimum of $\text{tr}\,(X - FS')'(X - FS')$
over the $n \times p$ matrices $F$ that satisfy $F'F = H$ is

$$\min\ \text{tr}\,(X - FS')'(X - FS') = \text{tr}\,X'X + \text{tr}\,SHS' - 2\,\text{tr}\,\Gamma \tag{12}$$

and the minimum is attained at any $F$ with singular value decomposition

$$F = (PQ' + P_\perp D Q_\perp')\Phi N', \tag{13}$$

where the $(n - r) \times (s - r)$ matrix $D$ satisfies $D'D = I$ but is otherwise arbitrary.

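Proof: It follows from lemma 1 that minimizing the sum of squares is equivalent to maximizing
$\text{tr}\,T'R$ over $T'T = I$. Let

$$T = \begin{bmatrix} P & P_\perp \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} Q' \\ Q_\perp' \end{bmatrix}.$$

Then $\text{tr}\,T'R = \text{tr}\,A'\Gamma$ while $T'T = I$ when

$$\begin{bmatrix} A'A + C'C & A'B + C'D \\ B'A + D'C & B'B + D'D \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$$

The maximum of $\text{tr}\,A'\Gamma$ over $A'A \leq I$ is equal to $\text{tr}\,\Gamma$ and is attained (uniquely) for $A = I$,
which means that $C = 0$ and $B = 0$. Thus

$$T = PQ' + P_\perp D Q_\perp',$$

and $F$ has singular value decomposition

$$F = (PQ' + P_\perp D Q_\perp')\Phi N',$$

where $D'D = I$. QED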
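
The construction in Theorem 2 translates almost line by line into R. The sketch below is an illustration under stated assumptions, not the paper's own code: the function name `ls_factor_scores`, the tolerance `eps`, the default choice of D (leading columns of an identity matrix) and the small random test problem are all introduced here. It returns an F from (13) together with the minimum loss from (12).

```r
# Least squares factor scores for fixed S and H, following (9), (11), (13).
# Function name, `eps` and the default D are illustrative assumptions.
ls_factor_scores <- function(x, s, h, d = NULL, eps = 1e-10) {
  n <- nrow(x)
  # (9): eigen decomposition of H, keeping the positive eigenvalues.
  e <- eigen(h, symmetric = TRUE)
  pos <- e$values > eps
  nmat <- e$vectors[, pos, drop = FALSE]              # N, p x s
  phi <- diag(sqrt(e$values[pos]), nrow = sum(pos))   # Phi, s x s
  sdim <- ncol(nmat)
  # (11): full SVD of R = X S N Phi, so P_perp and Q_perp are available.
  r <- x %*% s %*% nmat %*% phi
  sv <- svd(r, nu = n, nv = sdim)
  rk <- sum(sv$d > eps * max(sv$d))
  p1 <- sv$u[, seq_len(n) <= rk, drop = FALSE]
  p2 <- sv$u[, seq_len(n) > rk, drop = FALSE]
  q1 <- sv$v[, seq_len(sdim) <= rk, drop = FALSE]
  q2 <- sv$v[, seq_len(sdim) > rk, drop = FALSE]
  # Default D: leading columns of an identity matrix, so that D'D = I.
  if (is.null(d)) d <- diag(1, n - rk, sdim - rk)
  # (13): F = (P Q' + P_perp D Q_perp') Phi N'.
  tmat <- p1 %*% t(q1) + p2 %*% d %*% t(q2)
  f <- tmat %*% phi %*% t(nmat)
  # (12): minimum loss from the trace norm of R.
  loss <- sum(x^2) + sum(diag(s %*% h %*% t(s))) - 2 * sum(sv$d[seq_len(rk)])
  list(f = f, loss = loss, rank = rk)
}

# Small random problem: n = 12, m = 2, p = 5, H of rank s = 4.
set.seed(1)
x <- matrix(rnorm(12 * 2), 12, 2)
s <- matrix(rnorm(2 * 5), 2, 5)
h <- diag(5)
h[1:2, 1:2] <- matrix(c(1, -1, -1, 1), 2, 2)
fit <- ls_factor_scores(x, s, h)
# Loss evaluated directly at F should agree with formula (12).
direct <- sum((x - fit$f %*% t(s))^2)
c(direct = direct, formula = fit$loss)
```

Passing a different `d` with orthonormal columns gives another minimizer with the same loss, which is the indeterminacy the theorem describes.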
4 Fundamental Theorem

Theorem 3: [Fundamental] 

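1. If $X \in \mathbb{R}^{n \times m}$, $S \in \mathbb{R}^{m \times p}$, $F \in \mathbb{R}^{n \times p}$ and $H \in \mathbb{R}^{p \times p}$ satisfy $X = FS'$ and $F'F = H$
then $X'X = SHS'$.

2. If $X \in \mathbb{R}^{n \times m}$, $S \in \mathbb{R}^{m \times p}$ and $H \in \mathbb{R}^{p \times p}$ satisfy $X'X = SHS'$ then there is an $F \in \mathbb{R}^{n \times p}$
such that $X = FS'$ and $F'F = H$.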

Proof: Part 1 is just a restatement of theorem 1. To prove part 2, we show that the
minimum least squares loss is zero if and only if X'X = SHS'. If X'X = SHS' then
$RR' = XSHS'X' = (XX')^2$. Thus the nonzero singular values of R are the same as the nonzero
eigenvalues of X'X and those of SHS', so the trace norm of R equals tr X'X = tr SHS', and
from (12) we find that minimum loss is zero. Conversely, if minimum loss is zero there is an
F with X = FS' and F'F = H and thus X'X = SHS'.
QED
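
The key identity in part 2 can be unpacked step by step. The chain below uses only $R = XSN\Phi$, the decomposition $H = N\Phi^2N'$ from (9), and the assumption $X'X = SHS'$; here $\Gamma$ denotes the diagonal matrix of nonzero singular values of $R$ in (11).

$$RR' = XSN\Phi^2N'S'X' = X(SHS')X' = X(X'X)X' = (XX')^2,$$

$$\text{tr}\,\Gamma = \text{tr}\,(RR')^{1/2} = \text{tr}\,XX' = \text{tr}\,X'X = \text{tr}\,SHS',$$

$$\text{tr}\,X'X + \text{tr}\,SHS' - 2\,\text{tr}\,\Gamma = 0.$$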

5 Quantifying Indeterminacy 

For two different choices $D_1$ and $D_2$ of $D$ in (13) we have

$$\Phi^{-1} N' F_1' F_2 N \Phi^{-1} = QQ' + Q_\perp D_1' D_2 Q_\perp'.$$

Thus the canonical correlations of $F_1$ and $F_2$ are the $s - r$ singular values of $D_1'D_2$, in addition
to $r$ canonical correlations equal to one. Because $D_1'D_2 \geq -I$ we see that

$$\Phi^{-1} N' F_1' F_2 N \Phi^{-1} \geq QQ' - Q_\perp Q_\perp' = I - 2Q_\perp Q_\perp' = 2QQ' - I,$$

and thus $\text{tr}\,\Phi^{-1} N' F_1' F_2 N \Phi^{-1} \geq 2r - s$, a result due to Schönemann (1971) in classical
exploratory factor analysis. In that case we have $p = s = m + c$, with $c$ the number of
common factors, and $r = m$. Thus $2r - s = m - c$ and the average correlation between
corresponding factors in two solutions is at least $(1 - c/m)/(1 + c/m)$.
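
The bound can be checked numerically. The crossproduct of the two orthonormal factors from (13) is an s x s matrix, so the sketch below expresses it through the right singular vectors Q and Q_perp of (11). This is an illustration only: the sizes, the seed and the helper `orth` are assumptions made here, and random orthonormal bases stand in for the singular vectors of an actual R.

```r
set.seed(3)
n <- 10; s <- 5; r <- 3
# Helper (hypothetical): random k x k orthonormal matrix from a QR.
orth <- function(k) qr.Q(qr(matrix(rnorm(k * k), k, k)))
pfull <- orth(n)
qfull <- orth(s)
p1 <- pfull[, 1:r]; p2 <- pfull[, (r + 1):n]   # P and P_perp
q1 <- qfull[, 1:r]; q2 <- qfull[, (r + 1):s]   # Q and Q_perp
# Two choices of D, each (n - r) x (s - r) with orthonormal columns.
d1 <- qr.Q(qr(matrix(rnorm((n - r) * (s - r)), n - r, s - r)))
d2 <- qr.Q(qr(matrix(rnorm((n - r) * (s - r)), n - r, s - r)))
t1 <- p1 %*% t(q1) + p2 %*% d1 %*% t(q2)
t2 <- p1 %*% t(q1) + p2 %*% d2 %*% t(q2)
# T1'T2 equals Phi^{-1} N' F1' F2 N Phi^{-1} for F = T Phi N'.
cross <- crossprod(t1, t2)
target <- q1 %*% t(q1) + q2 %*% crossprod(d1, d2) %*% t(q2)
max(abs(cross - target))
# Canonical correlations: r ones plus the singular values of D1'D2.
svd(cross)$d
# Trace bound 2r - s.
c(trace = sum(diag(cross)), bound = 2 * r - s)
```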

6 Example 

Our example is completely fictional, but it does illustrate some of the computations, even in 
the case where there are singularities. In factor analytic terminology, it has some correlated 
common factors and some correlated unique factors. The matrix S is


##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    0    1    0    1    0    0    0    0     0
## [2,]    2    0    0    0    0    2    0    0    0     0
## [3,]    3    0    0    0    0    0    3    0    0     0
## [4,]    0   -1    0    0    0    0    0    3    0     0
## [5,]    0   -2    0    0    0    0    0    0    2     0
## [6,]    0   -3    0    1    0    0    0    0    0     1

and the matrix H is

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1   -1    0    0    0    0    0    0    0     0
##  [2,]   -1    1    0    0    0    0    0    0    0     0
##  [3,]    0    0    1    0    0    0    0    0    0     0
##  [4,]    0    0    0    1    0    0    0    0    0     0
##  [5,]    0    0    0    0    1    0    0    0    0     0
##  [6,]    0    0    0    0    0    1    0    0    0     0
##  [7,]    0    0    0    0    0    0    1    0    0     0
##  [8,]    0    0    0    0    0    0    0    1    0     0
##  [9,]    0    0    0    0    0    0    0    0    1    -1
## [10,]    0    0    0    0    0    0    0    0   -1     1


Matrix S has rank 6 and H has rank 8 (two eigenvalues equal to two, six eigenvalues equal
to one, two eigenvalues equal to zero). Eigenvalue decomposition of H defines $N$ and $\Phi$.

Thus SHS' is

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    3    2    3    1    2    3
## [2,]    2    8    6    2    4    6
## [3,]    3    6   18    3    6    9
## [4,]    1    2    3   10    2    3
## [5,]    2    4    6    2    8    4
## [6,]    3    6    9    3    4   11


with a trace equal to 58. We fill a 30 x 6 matrix X with random normal deviates and compute
$R = XSN\Phi$.
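
The matrices of this example are small enough to rebuild by hand. The following R sketch is a reconstruction, with entries read off the printed S and H rather than taken from the paper's code chunks. It checks the stated ranks and the trace of SHS'.

```r
# Loadings S (6 x 10), filled row by row from the printed matrix.
s <- matrix(0, 6, 10)
s[1, c(1, 3, 5)] <- 1
s[2, c(1, 6)] <- 2
s[3, c(1, 7)] <- 3
s[4, c(2, 8)] <- c(-1, 3)
s[5, c(2, 9)] <- c(-2, 2)
s[6, c(2, 4, 10)] <- c(-3, 1, 1)
# H (10 x 10): identity with two singular 2 x 2 blocks.
h <- diag(10)
h[1:2, 1:2] <- matrix(c(1, -1, -1, 1), 2, 2)
h[9:10, 9:10] <- matrix(c(1, -1, -1, 1), 2, 2)
shs <- s %*% h %*% t(s)
qr(s)$rank                                     # rank of S
round(eigen(h, symmetric = TRUE)$values, 10)   # eigenvalues of H
shs                                            # the 6 x 6 matrix SHS'
sum(diag(shs))                                 # trace of SHS'
```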


The sum of squares of X is 220.2584815037, and the trace norm of R (sum of singular values)
is 100.3767382715, which means minimum least squares loss is 77.5050049607. 

We can now use (13) to construct F. Choose the 24 x 2 matrix D by 

d <- matrix(0, 24, 2)
diag(d) <- 1


The corresponding F is 

Using the sup-norm we find $\|H - F'F\|_\infty$ equal to 1.4433e-15 and $\text{tr}\,(X - FS')'(X - FS')$
equal to 77.5050049607.

Now we cheat a bit and simply make an X such that X'X = SHS'. We use a 30 x 30 matrix
of standard normals, orthonormalize it with qr(), and take the first 6 columns as $K$ and the
last 24 as $K_\perp$. Then $\Lambda$ and $L$ are taken from the eigenvalue decomposition of SHS'.

The sum of squares of X and the trace norm of R are both 58, which means minimum 
least squares loss is zero. With the same D as before we use (13) to construct F. Again 
$\|H - F'F\|_\infty$ is equal to 6.6613e-16 and now $\|X - FS'\|_\infty$ is equal to 1.3212e-14.
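
One way to build such an X can be sketched in R. The text does not say how K, Lambda and L are combined, so the formula X = K Lambda^{1/2} L' used below is an assumption, as are the seed and the tolerance. S and H are rebuilt from the printed matrices so that the block runs on its own. Only K is needed here; K_perp is not used in this sketch.

```r
set.seed(12345)
# S and H of the example, rebuilt from the printed matrices.
s <- matrix(0, 6, 10)
s[1, c(1, 3, 5)] <- 1
s[2, c(1, 6)] <- 2
s[3, c(1, 7)] <- 3
s[4, c(2, 8)] <- c(-1, 3)
s[5, c(2, 9)] <- c(-2, 2)
s[6, c(2, 4, 10)] <- c(-3, 1, 1)
h <- diag(10)
h[1:2, 1:2] <- matrix(c(1, -1, -1, 1), 2, 2)
h[9:10, 9:10] <- matrix(c(1, -1, -1, 1), 2, 2)
shs <- s %*% h %*% t(s)
# K: first 6 columns of an orthonormalized 30 x 30 matrix of normals.
k <- qr.Q(qr(matrix(rnorm(900), 30, 30)))[, 1:6]
# Lambda and L: eigenvalues and eigenvectors of SHS'.
es <- eigen(shs, symmetric = TRUE)
# Assumed construction: X = K Lambda^{1/2} L', so that X'X = SHS'.
x <- k %*% diag(sqrt(es$values)) %*% t(es$vectors)
# N and Phi from the positive eigenvalues of H, then R = X S N Phi.
eh <- eigen(h, symmetric = TRUE)
pos <- eh$values > 1e-10
r <- x %*% s %*% eh$vectors[, pos] %*% diag(sqrt(eh$values[pos]))
ssq <- sum(x^2)          # sum of squares of X
tnorm <- sum(svd(r)$d)   # trace norm of R
min_loss <- ssq + sum(diag(shs)) - 2 * tnorm
c(ssq = ssq, tnorm = tnorm, min_loss = min_loss)
```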
7 And Now ... For Our Next Act


In the second part of this paper we will discuss an algorithm in R that minimizes (8) over
$S \in \mathcal{S}$, $H \in \mathcal{H}$ and F with F'F = H. This is a one-step algorithm, in the sense that we
construct S, H and F simultaneously. Clearly choice of $\mathcal{S}$ and $\mathcal{H}$ is critical here. In fact, we
will modify the problem somewhat so that we minimize

$$\sigma(F, H, S) = \text{tr}\,(X - FHS')'(X - FHS'),$$

where we require F'F = I and $H \in \mathcal{H}$. Both $\mathcal{S}$ and $\mathcal{H}$ are defined by elementwise box
constraints, which include the extreme cases of no constraint and constrained to be a fixed
real number.